{
	"cells": [
		{
			"cell_type": "markdown",
			"metadata": {
				"editable": true,
				"trusted": true
			},
			"source": [
				"[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)\n",
				"\n",
				"# 36 - Distributing Calls on Glue Interactive sessions\n",
				"\n",
				"AWS SDK for pandas is pre-loaded into [AWS Glue interactive sessions](https://docs.aws.amazon.com/glue/latest/dg/is-using-ray.html) with Ray kernel, making it by far the easiest way to experiment with the library at scale."
			]
		},
		{
			"cell_type": "markdown",
			"metadata": {
				"editable": true,
				"trusted": true
			},
			"source": [
				"In AWS Glue Studio, choose `Notebook` to create an AWS Glue interactive session:\n",
				"\n",
				"![](_static/glue_is_create.png)\n",
				"\n",
				"Then select `Ray` as the kernel. The IAM role must trust the AWS Glue service principal.\n",
				"\n",
				"![](_static/glue_is_setup.png)\n",
				"\n",
				"Once the notebook is up and running you can import the library. You can install `awswrangler` and `modin` as additional dependencies."
			]
		},
		{
			"cell_type": "code",
			"execution_count": 16,
			"metadata": {
				"tags": [],
				"trusted": true,
				"vscode": {
					"languageId": "python"
				}
			},
			"outputs": [
				{
					"name": "stdout",
					"output_type": "stream",
					"text": [
						"Additional python modules to be included:\n",
						"awswrangler\n",
						"modin\n"
					]
				}
			],
			"source": [
				"%additional_python_modules awswrangler,modin"
			]
		},
		{
			"cell_type": "code",
			"execution_count": 1,
			"metadata": {
				"tags": [],
				"trusted": true,
				"vscode": {
					"languageId": "python"
				}
			},
			"outputs": [
				{
					"name": "stdout",
					"output_type": "stream",
					"text": [
						"Authenticating with environment variables and user-defined glue_role_arn: arn:aws:iam::463623607974:role/service-role/AmazonSageMakerServiceCatalogProductsGlueRole\n",
						"Trying to create a Glue session for the kernel.\n",
						"Worker Type: Z.2X\n",
						"Number of Workers: 5\n",
						"Session ID: 32566e82-34d2-4db7-adac-cbee573e20bf\n",
						"Job Type: glueray\n",
						"Applying the following default arguments:\n",
						"--glue_kernel_version 0.38.1\n",
						"--enable-glue-datacatalog true\n",
						"--auto-scaling-ray-min-workers 1\n",
						"--additional-python-modules awswrangler,modin\n",
						"Waiting for session 32566e82-34d2-4db7-adac-cbee573e20bf to get into ready status...\n",
						"Session 32566e82-34d2-4db7-adac-cbee573e20bf has been created.\n"
					]
				}
			],
			"source": [
				"import awswrangler as wr"
			]
		},
		{
			"cell_type": "code",
			"execution_count": 8,
			"metadata": {
				"tags": [],
				"trusted": true,
				"vscode": {
					"languageId": "python"
				}
			},
			"outputs": [],
			"source": [
				"df = wr.s3.read_parquet(path=\"s3://ursa-labs-taxi-data/2017/\")"
			]
		},
		{
			"cell_type": "code",
			"execution_count": 9,
			"metadata": {
				"tags": [],
				"trusted": true,
				"vscode": {
					"languageId": "python"
				}
			},
			"outputs": [
				{
					"name": "stdout",
					"output_type": "stream",
					"text": [
						"  vendor_id           pickup_at  ... improvement_surcharge  total_amount\n",
						"0         1 2017-01-09 11:13:28  ...                   0.3     15.300000\n",
						"1         1 2017-01-09 11:32:27  ...                   0.3      7.250000\n",
						"2         1 2017-01-09 11:38:20  ...                   0.3      7.300000\n",
						"3         1 2017-01-09 11:52:13  ...                   0.3      8.500000\n",
						"4         2 2017-01-01 00:00:00  ...                   0.3     52.799999\n",
						"\n",
						"[5 rows x 17 columns]\n"
					]
				}
			],
			"source": [
				"df.head()"
			]
		},
		{
			"cell_type": "markdown",
			"metadata": {},
			"source": [
				"<div class=\"alert alert-block alert-warning\">\n",
				"To avoid incurring a charge, make sure to delete the Jupyter Notebook when you are done experimenting.\n",
				"</div>"
			]
		}
	],
	"metadata": {
		"kernelspec": {
			"display_name": "Glue PySpark",
			"language": "python",
			"name": "glue_pyspark"
		},
		"language_info": {
			"codemirror_mode": {
				"name": "python",
				"version": 3
			},
			"file_extension": ".py",
			"mimetype": "text/x-python",
			"name": "Python_Glue_Session",
			"pygments_lexer": "python3"
		}
	},
	"nbformat": 4,
	"nbformat_minor": 4
}
